California Electric Vehicle Registration Analysis

Introduction

In this notebook I analyze electric vehicle registrations in California. My hypothesis is that zip codes with higher median annual incomes will have higher numbers of battery-electric vehicle registrations. I acquired two datasets from the California state government; below, I test the hypothesis and explore what else the data reveal.

Data Explained

Link to first dataset: https://data.ca.gov/dataset/personal-income-tax-statistics-by-zip-code/resource/7091fcca-e695-49ab-aa44-6e0c6f49c9c1

Link to second dataset: https://data.ca.gov/dataset/vehicle-fuel-type-count-by-zip-code/resource/d304108a-06c1-462f-a144-981dd0109900

The first dataset contains personal income tax statistics by zip code: the number of tax returns filed, California adjusted gross income (CA AGI), total tax liability, and so on. The second dataset contains vehicle registration counts by zip code, broken down by make, model, and fuel type, including gasoline, diesel, hybrid, and battery electric. Both datasets had columns that were unnecessary for this analysis, such as make/model, heavy/light duty, and model year, so those columns were dropped and the two datasets were joined on zip code into a single frame. Neither dataset provides a median income, so the average AGI per tax return in each zip code was calculated and used as a proxy.
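The preparation steps described above can be sketched in pandas as follows. This is a minimal sketch, not the notebook's actual code: the column names ("Zip Code", "Fuel", "Vehicles", "ZIP Code", "Returns", "CA AGI", "Total Tax Liability") and the "Battery Electric" fuel label are assumptions about the two downloads, and every row is an invented toy value.

```python
import pandas as pd

# Toy stand-ins for the two downloads; column names and values are assumptions.
vehicles = pd.DataFrame({
    "Zip Code": ["90012", "90012", "94103", "94103", "92101"],
    "Model Year": [2020, 2018, 2021, 2019, 2017],   # dropped below
    "Fuel": ["Battery Electric", "Gasoline", "Battery Electric",
             "Battery Electric", "Gasoline"],
    "Make": ["MAKE_A", "MAKE_B", "MAKE_C", "MAKE_D", "MAKE_E"],  # dropped below
    "Duty": ["Light", "Light", "Light", "Light", "Heavy"],       # dropped below
    "Vehicles": [120, 900, 300, 45, 700],
})
income = pd.DataFrame({
    "ZIP Code": ["90012", "94103", "92101"],
    "Returns": [10_000, 12_000, 8_000],
    "CA AGI": [600_000_000, 1_200_000_000, 480_000_000],
    "Total Tax Liability": [30_000_000, 80_000_000, 20_000_000],
})

# Keep only battery-electric rows and the two needed columns,
# then total the vehicle counts per zip code.
bev = (
    vehicles.loc[vehicles["Fuel"] == "Battery Electric", ["Zip Code", "Vehicles"]]
    .groupby("Zip Code", as_index=False)["Vehicles"].sum()
    .rename(columns={"Zip Code": "ZIP Code", "Vehicles": "Battery Electric"})
)

# Join on zip code (both sides stored as strings so the keys match).
# An inner join keeps only zip codes present in both tables.
combined = income.merge(bev, on="ZIP Code", how="inner")

# Average AGI per return, the stand-in for median income.
combined["Avg AGI"] = combined["CA AGI"] / combined["Returns"]
print(combined)
```

Whether the original analysis used an inner join is an assumption; a left join from the income table would instead keep zip codes with no battery-electric rows.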

See the combined dataset (EVsAndIncome.xlsx) and data dictionary below:

Results

The average AGI per tax return was calculated by dividing "CA AGI" by "Returns," and, as shown above, a new column with that metric was added to the dataset. The first scatterplot below tests the hypothesis that the number of registered electric vehicles in a zip code increases with average income. Zip codes with higher average incomes do tend to have more electric vehicles, though the relationship is weaker than expected.
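The relationship in the first scatterplot can also be checked numerically. A minimal sketch, assuming a combined frame with hypothetical "CA AGI", "Returns", and "Battery Electric" columns and invented values, computes the per-return average and its Pearson correlation with the battery-electric count:

```python
import pandas as pd

# Hypothetical per-zip totals; the numbers are invented for illustration.
df = pd.DataFrame({
    "Returns": [5_000, 8_000, 12_000, 20_000],
    "CA AGI": [250_000_000, 560_000_000, 1_080_000_000, 2_400_000_000],
    "Battery Electric": [40, 90, 150, 400],
})

# Average AGI per tax return for each zip code.
df["Avg AGI"] = df["CA AGI"] / df["Returns"]

# Pearson correlation between average AGI and the battery-electric count.
r = df["Avg AGI"].corr(df["Battery Electric"])
print(df["Avg AGI"].tolist(), round(r, 3))
```

`Series.corr` also accepts `method="spearman"`, a rank-based measure that is less sensitive to a handful of zip codes with very high counts.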

The next two scatterplots show how the number of electric vehicles in a zip code increases with its total tax liability and its total AGI. These measures are related to average income, but the rise in electric vehicles is much steeper. This could indicate that the overall wealth of a zip code has a greater effect on the number of battery-electric vehicles than average income does. Because both totals also grow with the number of returns filed, part of this effect may reflect a zip code's population rather than its wealth.

The distribution plot below groups zip codes into bins by their number of electric vehicles and shows which ranges are most common. There is no bar at 0 on the x-axis, indicating that every zip code had at least one electric vehicle.
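Whether a zero appears as its own bar depends on the bin edges: a bin spanning 0 to 10 mixes zero-EV zip codes with ones that have a few EVs. The sketch below uses invented counts that deliberately include one zero. It bins them with `pd.cut` and then counts zero-EV zip codes directly, which is the more reliable check. The bin edges are an arbitrary choice for illustration.

```python
import pandas as pd

# Invented battery-electric counts per zip code (one zero on purpose).
ev_counts = pd.Series([3, 12, 7, 45, 0, 18, 22, 5, 60, 9])

# Equal-width bins, as a histogram would draw them; include_lowest puts 0
# into the first bin instead of leaving it out.
bins = [0, 10, 20, 30, 40, 50, 60]
binned = pd.cut(ev_counts, bins=bins, include_lowest=True).value_counts(sort=False)
print(binned)

# The first bar merges zeros with small counts, so count zeros explicitly.
zero_ev_zips = int((ev_counts == 0).sum())
print("zip codes with no EVs:", zero_ev_zips)
```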

The heat map below shows the strength of the correlations between the variables. It reinforces the second and third scatterplots: orange and red cells mark the stronger correlations between battery-electric vehicles and both total AGI and total tax liability. Light to dark blue cells mark weaker correlations, though some correlation remains.
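A heat map like this one is typically drawn from a pairwise correlation matrix. The sketch below builds that matrix with `DataFrame.corr` (Pearson by default) on a small invented frame whose column names are assumptions. It then lists each variable's correlation with the battery-electric count.

```python
import pandas as pd

# Invented zip-level data; column names are assumptions.
df = pd.DataFrame({
    "Battery Electric": [40, 90, 150, 400, 60],
    "Returns": [5_000, 8_000, 12_000, 20_000, 9_000],
    "CA AGI": [250_000_000, 560_000_000, 1_080_000_000, 2_400_000_000, 450_000_000],
    "Total Tax Liability": [10_000_000, 28_000_000, 60_000_000, 150_000_000, 20_000_000],
})
df["Avg AGI"] = df["CA AGI"] / df["Returns"]

# Pairwise Pearson correlations among all columns; a heat map colours
# each cell of this matrix.
corr = df.corr()

# Correlation of every other column with the battery-electric count, strongest first.
print(corr["Battery Electric"].drop("Battery Electric").sort_values(ascending=False))
```

To render the matrix, `seaborn.heatmap(corr, annot=True, cmap="coolwarm")` is one option; the notebook's actual plotting call and colour map may differ.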

The code below converts each zip code to its county and each county to its FIPS code, so that the data can be mapped by county and the county names can serve as a categorical variable in a boxplot.
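The zip-to-county-to-FIPS conversion amounts to two lookups. In this sketch the lookup dictionaries are hypothetical and cover only three zip codes. A real analysis would need a complete ZIP-to-county crosswalk (for example, HUD's USPS ZIP Code crosswalk files) and a rule for zip codes that span more than one county. The FIPS codes shown, state code 06 followed by a three-digit county code, are the real ones for these counties.

```python
import pandas as pd

# Hypothetical, deliberately tiny lookup tables.
ZIP_TO_COUNTY = {"94103": "San Francisco", "90012": "Los Angeles", "92101": "San Diego"}
COUNTY_TO_FIPS = {"San Francisco": "06075", "Los Angeles": "06037", "San Diego": "06073"}

# Invented battery-electric counts per zip code.
df = pd.DataFrame({
    "ZIP Code": ["94103", "90012", "92101"],
    "Battery Electric": [345, 120, 80],
})

# Zip -> county name; unmatched zips would become NaN.
df["County"] = df["ZIP Code"].map(ZIP_TO_COUNTY)

# County -> FIPS, kept as zero-padded strings so the leading 0 survives.
df["FIPS"] = df["County"].map(COUNTY_TO_FIPS)

# County names double as a categorical variable for grouping (e.g. a boxplot).
df["County"] = df["County"].astype("category")
print(df)
```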

The code below creates a map of battery-electric vehicles by California county. Darker orange and red hues indicate higher numbers of electric vehicles, while yellow, green, and blue indicate lower numbers. As the map shows, the highest concentrations of electric vehicles are in the San Francisco Bay Area, the Los Angeles County area, and the San Diego area; farther north and inland, the numbers are comparatively low.
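Before a county map can be coloured, zip-level counts have to be rolled up to one value per county. A minimal sketch with invented rows that already carry a FIPS column:

```python
import pandas as pd

# Invented zip-level rows tagged with county FIPS codes.
df = pd.DataFrame({
    "FIPS": ["06075", "06037", "06037", "06073", "06075"],
    "Battery Electric": [345, 120, 210, 80, 95],
})

# One row per county: the total battery-electric count a choropleth colours by.
by_county = df.groupby("FIPS", as_index=False)["Battery Electric"].sum()
print(by_county)
```

With Plotly Express, one option (an assumption about how the map was drawn) is `px.choropleth(by_county, geojson=counties, locations="FIPS", color="Battery Electric", fitbounds="locations")`, where `counties` is a county GeoJSON whose feature `id`s are FIPS codes.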

The code below creates a boxplot for each California county, showing the median, spread, and outliers of the number of electric vehicles registered per zip code in that county.
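A boxplot summarizes each group by its median, quartiles, and outliers; Matplotlib and seaborn both place the whiskers at 1.5 × the interquartile range (IQR) by default and draw points beyond them as outliers. The sketch below computes those statistics per county with pandas. The county labels and counts are invented placeholders, not results from the data.

```python
import pandas as pd

# Invented zip-level battery-electric counts tagged with their county.
df = pd.DataFrame({
    "County": ["Alameda"] * 5 + ["Fresno"] * 5,
    "Battery Electric": [120, 150, 160, 180, 900, 10, 12, 15, 18, 20],
})

def box_stats(s: pd.Series) -> pd.Series:
    """Quartiles, median, and count of points beyond the 1.5 * IQR fences."""
    q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
    return pd.Series({"q1": q1, "median": med, "q3": q3, "n_outliers": len(outliers)})

# One row of box statistics per county.
stats = df.groupby("County")["Battery Electric"].apply(box_stats).unstack()
print(stats)
```

`seaborn.boxplot(data=df, x="County", y="Battery Electric")` draws the corresponding plot from the same frame.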

Summary

In all, the analysis did reveal a correlation between income and electric vehicle registrations at both the zip code and county level. Average income per tax return was positively correlated with the number of registered electric vehicles, but less strongly than expected. The correlations between electric vehicle registrations and a zip code's total tax liability and total AGI were stronger. This could indicate that wealthier areas have higher rates of electric vehicle adoption. However, the plots could also indicate a positive correlation between population density and the number of electric vehicles, so further analysis may yield more predictive results. Population density versus the number of battery-electric vehicle registrations is my suggestion for future analysis.